Why fish hydroacoustics?

  • Non-invasive monitoring
  • Management: distinguish LT vs SMB
  • Question: can the Frequency Response Curve (FRC, 45–170 kHz) classify species?
  • Outcome: scalable pipeline for species ID & trends

Animation: sonar pulses bouncing off a fish

About the dataset

  • ~30k rows × 302 variables; two species (LT, SMB)
  • Processed in Echoview
  • Signals: F45–F170 (Frequency Response Curve)
  • Context: morphometrics, depth, speed, orientation
  • Focus today: frequency data only for classification

Acoustic fingerprints: the Frequency Response Curve (FRC)

  • Echo strength measured at each frequency (45–170 kHz) → the FRC
  • Species show distinct curve shapes
  • Hypothesis: FRC alone can separate LT vs SMB

Figure 1

Which frequencies separate species?

  • Compare LT vs SMB at each frequency (standardised difference)
  • Peaks indicate highly discriminative frequencies

Figure 2
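
A minimal sketch of the per-frequency comparison, assuming "standardised difference" means a Cohen's d-style statistic (difference in species means divided by the pooled standard deviation). The slides do not state the exact formula, so this is one plausible reading; the toy values and the function name are made up for illustration.

```python
import math
from statistics import mean, variance

def standardised_difference(lt_values, smb_values):
    """Cohen's d-style score: (mean_LT - mean_SMB) / pooled standard deviation."""
    n1, n2 = len(lt_values), len(smb_values)
    pooled_var = ((n1 - 1) * variance(lt_values)
                  + (n2 - 1) * variance(smb_values)) / (n1 + n2 - 2)
    return (mean(lt_values) - mean(smb_values)) / math.sqrt(pooled_var)

# Toy dB values (invented, not from the dataset) at two frequencies.
toy = {
    "F45": {"LT": [-40.0, -42.0, -41.0], "SMB": [-40.5, -41.5, -41.0]},
    "F90": {"LT": [-35.0, -36.0, -34.0], "SMB": [-45.0, -44.0, -46.0]},
}

# One score per frequency; large absolute values mark discriminative bands.
scores = {freq: standardised_difference(v["LT"], v["SMB"]) for freq, v in toy.items()}
ranked = sorted(scores, key=lambda freq: abs(scores[freq]), reverse=True)
print(ranked)
```

In this toy case the species means coincide at F45 and differ strongly at F90, so F90 ranks first.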

From curves to models

  • Each fish’s FRC (45–170 kHz) summarised
    • Quantiles (q20–q100) & Median
  • feasts features capture curve shape (ACF/PACF/STL) to encode smoothness, periodicity, and slope changes.
  • H2O AutoML across Stacked Ensemble / GBM / Deep Learning / XGBoost
  • Grouped CV by fish, test on held-out set

Pipeline

  1) Raw Sonar: F45–F170 per ping
       ↓ per fish
  2) Per-fish Aggregation: Quantiles (q20–q100) / Median
       ↓ shape descriptors
  3) feasts Features: ACF / PACF / STL (shape)
       ↓ features → classifier
  4) H2O AutoML: GBM / Deep Learning / XGBoost

  Note (step 2): Grouped CV by fishNum (60/20/20)
  Note (step 4): Thresholds policy & OOF clamp [0.40–0.70]
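
Steps 2 and 3 can be sketched roughly as follows. This is an illustration in plain Python, not the project's code: the actual features come from feasts in R, and the lag-1 autocorrelation below is only a stand-in for one ACF-type shape descriptor. The quantile rule (linear interpolation), the function names, and the toy pings are assumptions.

```python
from statistics import mean, median

def quantile(values, q):
    """Linear-interpolation quantile for 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def aggregate_fish(pings, qs=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Collapse one fish's pings (equal-length curves) into q20..q100 and median curves."""
    n_freq = len(pings[0])
    columns = [[ping[j] for ping in pings] for j in range(n_freq)]
    rows = {f"q{int(round(q * 100))}": [quantile(col, q) for col in columns] for q in qs}
    rows["median"] = [median(col) for col in columns]
    return rows

def acf1(curve):
    """Lag-1 autocorrelation along the frequency axis (a simple smoothness measure)."""
    m = mean(curve)
    denom = sum((x - m) ** 2 for x in curve)
    if denom == 0:
        return 0.0
    numer = sum((curve[i] - m) * (curve[i + 1] - m) for i in range(len(curve) - 1))
    return numer / denom

# Toy fish: three pings, four frequencies each (invented values).
pings = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
rows = aggregate_fish(pings)
shape = {label: acf1(curve) for label, curve in rows.items()}
print(rows["median"], shape["median"])
```

Each fish yields five quantile curves (the QUINTILES variants) or one median curve (the MEDIAN variants), and each curve gets its own shape descriptors.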

Baselines: RNN & per-ping AutoML

Pre-processing (used by the RNN):

  • Size-standardise each fish to 450 mm with offset_dB = 10*log10(450/length)

  • Convert F45–F170 from dB → linear backscatter: 10^((dB + offset_dB)/10)

  • Grouped splits by fishNum (train / valid / test)
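
The three pre-processing steps might look like the sketch below. It assumes fish length is given in mm, uses the usual base-10 convention for converting dB to linear units, and takes a 60/20/20 split over fish as an illustrative default. Function names and toy inputs are hypothetical.

```python
import math
import random

REF_LENGTH_MM = 450.0  # reference size from the slide

def size_offset_db(length_mm):
    """dB offset that rescales a fish of the given length to the 450 mm reference."""
    return 10.0 * math.log10(REF_LENGTH_MM / length_mm)

def to_linear(db_value, offset_db=0.0):
    """Convert a dB value (plus size offset) to linear backscatter, base 10."""
    return 10.0 ** ((db_value + offset_db) / 10.0)

def grouped_split(fish_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole fish to train/valid/test so no fish appears in two splits."""
    unique = sorted(set(fish_ids))
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_valid = round(fractions[1] * len(unique))
    return {
        "train": set(unique[:n_train]),
        "valid": set(unique[n_train:n_train + n_valid]),
        "test": set(unique[n_train + n_valid:]),
    }

# Toy usage: a 45 mm fish gets a +10 dB offset; ten fish with three pings each.
offset = size_offset_db(45.0)
linear = to_linear(-30.0, offset)
splits = grouped_split([fish for fish in range(10) for _ in range(3)])
print(offset, linear, {name: len(ids) for name, ids in splits.items()})
```

Splitting on fishNum rather than on rows keeps pings from the same fish out of both training and test data, which would otherwise inflate test accuracy.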

Baselines on TEST set

Model                                 Acc @ 0.50   Policy thr   Acc @ Policy
RNN (reproduction on 5-ping blocks)   0.593        -            -
AutoML (per-ping, original data)      0.668        0.7000       0.663

TS-features: AutoML (per-fish)

Four variants

  • QUINTILES_ALLFREQ (5 rows/fish: F* + tsfeatures)
  • QUINTILES_FEATS (5 rows/fish: tsfeatures only)
  • MEDIAN_ALLFREQ (1 row/fish: median F* + tsfeatures)
  • MEDIAN_FEATS (1 row/fish: tsfeatures only)

TS-features AutoML: TEST results (policy = VALID max-ACC, clamped [0.40, 0.70])

Variant             Acc @ 0.50   Policy thr   Acc @ Policy
QUINTILES_ALLFREQ   0.883        0.4754       0.867
QUINTILES_FEATS     0.633        0.4000       0.750
MEDIAN_ALLFREQ      0.917        0.5796       0.833
MEDIAN_FEATS        0.667        0.4000       0.750
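
One way to read the threshold policy (VALID max-ACC, clamped to [0.40, 0.70]) is sketched below: pick the cut-off that maximises accuracy on the validation split, then clamp it into the allowed range. The candidate grid (the observed probabilities), the tie-breaking rule (lowest threshold wins), and the choice of which species is coded as 1 are assumptions; the toy probabilities are invented.

```python
def accuracy(probs, labels, thr):
    """Share of cases where (prob >= thr) matches the 0/1 label."""
    hits = sum((p >= thr) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def pick_threshold(valid_probs, valid_labels, lo=0.40, hi=0.70):
    """Max-accuracy threshold on validation data, clamped to [lo, hi]."""
    best_thr, best_acc = 0.5, -1.0
    for thr in sorted(set(valid_probs)):
        acc = accuracy(valid_probs, valid_labels, thr)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_thr, best_acc = thr, acc
    return min(max(best_thr, lo), hi)

# Toy validation scores: the unclamped optimum (0.35) falls below the lower bound.
valid_probs = [0.10, 0.30, 0.35, 0.80, 0.90]
valid_labels = [0, 0, 1, 1, 1]
policy_thr = pick_threshold(valid_probs, valid_labels)
print(policy_thr, accuracy(valid_probs, valid_labels, policy_thr))
```

The clamp trades a little validation accuracy for a threshold that stays in a plausible range, which may explain policy thresholds of exactly 0.4000 or 0.7000 in the results tables.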

Future Work: Improving Model Performance

  • OOF (Out-of-Fold) Threshold Tuning – to optimise classification thresholds

  • Model Hyperparameter Tuning – for the Deep Learning grid (layers, dropout, epochs, learning rate, etc.)

  • Use discriminative frequencies
    • Select F* bands that best separate LT vs SMB
    • Train compact models on these top F* only to reduce noise & overfitting

  • Add richer time-series features from fabletools/feasts
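
For the discriminative-frequency idea, band selection could be as simple as the sketch below: rank bands by the absolute value of a per-band separation score and keep the top k columns. The scores, the value of k, and the helper names are hypothetical; in practice the ranking would need to be computed on training fish only to avoid leakage.

```python
def top_k_bands(scores, k):
    """Names of the k bands with the largest absolute separation score."""
    return sorted(scores, key=lambda band: abs(scores[band]), reverse=True)[:k]

def keep_bands(row, bands):
    """Restrict one feature row (a dict of band -> value) to the selected bands."""
    return {band: row[band] for band in bands}

# Hypothetical scores and one hypothetical feature row.
band_scores = {"F45": 0.1, "F90": -1.2, "F120": 0.8, "F170": 0.3}
selected = top_k_bands(band_scores, k=2)
compact_row = keep_bands({"F45": -41.0, "F90": -35.0, "F120": -38.0, "F170": -44.0}, selected)
print(selected, compact_row)
```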

Every fish leaves a sonic fingerprint. Our job is to read it.
Fish Hydroacoustics — ETC5543